Table recognition in mathematical documents
Abstract
While a number of techniques have been developed for table recognition in ordinary text documents, these techniques are often ineffective for tables in mathematical documents, because tables containing mathematical structures can differ significantly from ordinary text tables. In fact, it is difficult even to distinguish clearly between table recognition in mathematics and layout analysis of mathematical formulae. Nor is it straightforward to adapt general layout analysis techniques to mathematical formulae. However, a reliable understanding of formula layout is often a necessary prerequisite for further semantic interpretation of the represented formulae. In this thesis, we present the necessary preprocessing steps towards a table recognition technique that specialises in tables in mathematical documents. It is based on our novel, robust line recognition technique for mathematical expressions, which is fully independent of understanding the content or specialist fonts of expressions. We also present a graph representation for complex mathematical table structures. A set of rewriting rules applied to the graph allows reliable re-composition of cells in order to identify several valid table interpretations. We demonstrate the effectiveness of our technique by applying it to a set of mathematical tables from a standard textbook that has been manually ground-truthed.
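The abstract does not spell out the graph model or the rewriting rules, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes that candidate cells are axis-aligned bounding boxes, that graph edges link cells on the same row separated by a small horizontal gap, and that a single hypothetical rewrite rule merges two linked cells into one. The names `Cell`, `build_graph`, `apply_merge_rule`, `interpretations` and the `max_gap` parameter are all illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    """Bounding box of one candidate table cell (hypothetical model)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlaps_vertically(a, b):
    """True if the two boxes share some vertical extent, i.e. sit on the same row."""
    return a.y0 < b.y1 and b.y0 < a.y1


def build_graph(cells, max_gap):
    """Return index pairs (i, j), i < j, for same-row cells at most max_gap apart."""
    edges = set()
    for i, j in combinations(range(len(cells)), 2):
        left, right = sorted((cells[i], cells[j]), key=lambda c: c.x0)
        gap = right.x0 - left.x1
        if overlaps_vertically(left, right) and 0 <= gap <= max_gap:
            edges.add((i, j))
    return edges


def merge(a, b):
    """Smallest box that covers both cells."""
    return Cell(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def apply_merge_rule(cells, edge):
    """Assumed rewrite rule: replace the two cells joined by `edge` with their merge."""
    i, j = edge
    rest = [c for k, c in enumerate(cells) if k not in (i, j)]
    return sorted(rest + [merge(cells[i], cells[j])], key=lambda c: (c.y0, c.x0))


def interpretations(cells, max_gap):
    """The unmerged reading first, then one alternative reading per graph edge."""
    result = [sorted(cells, key=lambda c: (c.y0, c.x0))]
    for edge in sorted(build_graph(cells, max_gap)):
        result.append(apply_merge_rule(cells, edge))
    return result
```

Each returned list is one candidate reading of the table. A fuller system would apply rules repeatedly and discard readings that violate table constraints, which this sketch does not attempt.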
Similar resources
An Efficient Recognition and Data Extraction Method for Table-Form Documents
In Asia, many documents processed in offices are table-form documents. Hence the automatic processing of table-form documents is an important issue of the office automation research. In this paper, we propose an efficient representation method for table-form documents. The representation method is based on three types of line segments. The line segments are normalized and sorted, hence the repr...
Recognising Tabular Mathematical Expressions Using Graph Rewriting
While a number of techniques have been developed for table recognition in ordinary text documents, very little work has been done on tables that contain mathematical expressions. The latter problem is complicated by the fact that mathematical formulae often have a tabular layout themselves, thus not only blurring the distinction between table and content structure, but often leading to a number...
An Adaptative Recognition System Using a Table Description Language for Hierarchical Table Structures in Archival Documents
Archival documents are difficult to recognize because they are often damaged. Moreover, variations between documents are important even for documents having a priori the same structure. A recognition system to overcome these difficulties needs an external knowledge. Therefore we present a recognition system using an user description. To use table descriptions in analyzing the image, our system ...
Image Registration and Text Recognition for Structured Census Documents
In this paper, we present our work on developing a system for registration and recognition of structured census documents. Information extraction from these documents present many challenges, for instance, table registration, cell extraction, binarization, and recognition of handwritten text. This paper mainly deals with table registration. It details the approach and algorithms we developed fo...
Extraction of Logical Structure from Articles in Mathematics
We propose a mathematical knowledge browser which helps people to read mathematical documents. By the browser printed mathematical documents can be scanned and recognized by OCR (Optical Character Recognition). Then the meta-information (e.g. title, author) and the logical structure (e.g. section, theorem) of the documents are automatically extracted. The purpose of this paper is to show the ex...
Publication date: 2015